Yingshui Tan

SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence

Jun 01, 2026

TVIR: Building Deep Research Agents Towards Text-Visual Interleaved Report Generation

Jun 01, 2026

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

May 31, 2026

SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems

Apr 08, 2026

AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

Apr 03, 2026

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Jan 16, 2026

BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents

Jan 08, 2026

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Dec 31, 2025

QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems

Dec 18, 2025

IFEvalCode: Controlled Code Generation

Jul 30, 2025